Data-Driven Detection of Malicious Document PhD Thesis Proposal

Authors

  • Wei-Jen Li
  • Salvatore J. Stolfo
Abstract

Malcode hidden in otherwise normal-appearing public documents provides a convenient and stealthy means for attackers to penetrate systems. By exploiting the ubiquitous, object-oriented design of modern document applications and formats, malcode can reach third-party applications that may harbor exploitable vulnerabilities otherwise unreachable by network-level service attacks: when a user clicks a link on an arbitrary website, or even inserts media such as a CD-ROM or USB drive, a target system can be infected as soon as a document is opened, effectively bypassing all network firewalls and sensors. To protect against this stealthy threat, this thesis investigates the ability to detect embedded malcode in Word documents using two techniques: static content analysis using statistical models of typical document content, and run-time dynamic tests. The first approach parses Word documents into their constituent objects and identifies the “sections” of the binary file content of those objects. The Anagram sensor, introduced for network packet anomaly detection, and an entropy analysis technique are applied for this purpose. The second approach uses a series of dynamic tests that open a document on diverse platforms and execute its embedded malcode in diverse environments, forcing the malcode to behave abnormally and leading to its detection. The intent of such code to do harm is detected by observing the unexpected changes it makes to the underlying platform when it is able to execute. In addition to simply observing the system, this document proposes a few techniques for monitoring virtual machines and detecting anomalies. A formal study using thousands of infected Word documents gathered from the wild has been performed, and several deficiencies of both approaches have been identified, representing both challenges in addressing the problem and opportunities for follow-on research.
This document presents a twenty-four-month research program for developing the major components of the proposed approaches, including a hybrid system that applies both techniques, and for completing the experiments and evaluations.
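
The two static measures named in the abstract, entropy analysis and n-gram anomaly detection, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the n-gram length of 5, and the use of a plain Python set are assumptions made for illustration (the Anagram sensor as published stores n-grams in Bloom filters), and the sketch operates on raw byte strings rather than on parsed Word document objects.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def byte_ngrams(data: bytes, n: int) -> list:
    """All overlapping n-grams of the byte string, in order."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def train_ngram_model(benign_samples, n: int = 5) -> set:
    """Collect every n-gram seen in benign data.

    Assumption: a plain set stands in for the Bloom filter
    that an Anagram-style sensor would use.
    """
    model = set()
    for sample in benign_samples:
        model.update(byte_ngrams(sample, n))
    return model


def unseen_ngram_ratio(data: bytes, model: set, n: int = 5) -> float:
    """Fraction of n-grams in `data` that never appeared in training.

    A higher ratio means the content looks less like the benign data.
    """
    grams = byte_ngrams(data, n)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in model)
    return unseen / len(grams)
```

In use, a model would be trained on byte sections taken from known-benign documents, and each section of a document under test would be scored with `unseen_ngram_ratio` and `shannon_entropy`. Any decision threshold on those scores would have to be chosen empirically.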

Similar resources

Monte Carlo Semantics: Robust Inference and Logical Pattern Processing Based on Integrated Deep and Shallow Semantic Representations

This document was submitted to the University of Cambridge Computer Laboratory as part of the documentation required by first year PhD candidates comprising a thesis proposal (Bergmair, 2007a) and a first year report (this document). In addition, a thesis draft (Bergmair, 2007b) has been submitted to supplement the required material. – For readers other than the examiners of this PhD project, i...

Data-driven Paraphrasing and Stylistic Harmonization

This thesis proposal outlines the use of unsupervised data-driven methods for paraphrasing tasks. We motivate the development of knowledge-free methods at the guiding use case of multi-document summarization, which requires a domain-adaptable system for both the detection and generation of sentential paraphrases. First, we define a number of guiding research questions that will be addressed in ...

Explaining the Expectations of Iran University of Medical Sciences Faculty from PhD Students in the Process of Completing the Thesis

Background: Knowing the expectations of supervisors may affect the quality of graduate students' theses. The aim of this study was to explore the expectations that supervisors have of Ph.D. students during the thesis process, using a qualitative content analysis design (conventional method). Methods: This qualitative study was conducted on 25 supervisors of Iran University of Medical Science...

Explaining the Expectations of PhD Students from Their Supervisors in Completing the Doctoral Thesis: A Qualitative Content Analysis

Introduction: The quality of research in PhD programs increases if supervisors become aware of students' expectations of them. This qualitative study aimed to explore the expectations of PhD students from their supervisors. Methods: This qualitative content analysis study was conducted on 22 graduated PhD students of Iran University of Medical Sciences, in 2014. The samples were purposef...

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable attackers to hide and obfuscate infectious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

Publication date: 2006